remotemanager.dataset.dataset module¶
Main Dataset module
Dataset is the primary class that users interact with
- class remotemanager.dataset.dataset.Dataset(function: Callable | str | None, url: URL | None = None, dbfile: str | None = None, transport: Transport | None = None, serialiser: serial | None = None, script: str | None = None, shebang: str | None = None, name: str | None = None, extra_files_send: List[str] | str | None = None, extra_files_recv: List[str] | str | None = None, verbose: int | bool | Verbosity | None = None, run_summary_limit: int = 25, add_newline: bool = True, skip: bool = True, extra: str | None = None, **global_run_args)[source]¶
Bulk holder for remote runs. The Dataset class handles anything regarding the runs as a group. Running, retrieving results, sending to remote, etc.
- Parameters:
function (Callable, str, None) – Function to run. Can be the function object, its source string, or None. If None, the Runner will pass arguments to the script method
url (URL) – connection to remote (optional)
transport (Transport) – transport system to use, if a specific one is required. Defaults to transport.rsync
serialiser (serial) – serialisation system to use, if a specific one is required. Defaults to serial.serialjson
script (str) – callscript required to run the jobs in this dataset
submitter (str) – command to exec any scripts with. Defaults to “bash”
name (str) – optional name for this dataset. Will be used for runscripts
extra_files_send (list, str) – extra files to send with this run
extra_files_recv (list, str) – extra files to retrieve with this run
skip (bool) – skip dataset creation if possible. Defaults True
extra – extra text to insert into the runner jobscripts
global_run_args – any further (unchanging) arguments to be passed to the runner(s)
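A minimal sketch of the usual call order, using only method names documented on this page (append_run, run, wait, fetch_results, results). The helper run_workflow and the stand-in class InProcessDataset are hypothetical names introduced for illustration. InProcessDataset simply runs the function in-process so the sketch works without a remote; it does not reflect how remotemanager is implemented. With the library installed, one would presumably pass the real Dataset class to run_workflow instead.

```python
from typing import Callable, Iterable, List


class InProcessDataset:
    """Hypothetical stand-in mirroring a few Dataset method names.

    It runs the function in-process; the real Dataset serialises the
    runs and executes them through a URL and transport.
    """

    def __init__(self, function: Callable, **global_run_args):
        self.function = function
        self.global_run_args = global_run_args
        self._pending: List[dict] = []
        self._results: List = []

    def append_run(self, args: dict = None, **run_args) -> None:
        # Store the arguments for later execution
        self._pending.append(dict(args or {}))

    def run(self, **run_args) -> bool:
        # Execute every stored argument set
        self._results = [self.function(**a) for a in self._pending]
        return True

    def wait(self, interval=10, timeout=None) -> None:
        # Nothing to wait for when running in-process
        return None

    def fetch_results(self, **kwargs) -> None:
        # Results are already local in this stand-in
        return None

    @property
    def results(self) -> list:
        return list(self._results)


def run_workflow(dataset_cls, function: Callable, arg_sets: Iterable[dict]) -> list:
    """Append one run per argument set, execute, then collect the results."""
    ds = dataset_cls(function=function)
    for args in arg_sets:
        ds.append_run(args=args)
    ds.run()
    ds.wait(interval=1, timeout=60)
    ds.fetch_results()
    return ds.results


def multiply(a, b):
    return a * b


print(run_workflow(InProcessDataset, multiply, [{"a": 2, "b": 3}, {"a": 4, "b": 5}]))
```

Passing the dataset class as a parameter keeps the workflow itself independent of where the runs execute.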
- classmethod recreate(*args, raise_if_not_found: bool = True, **kwargs)[source]¶
Attempts to extract a dataset matching the given args from the python garbage collection interface
- Parameters:
raise_if_not_found (bool) – raise ValueError if the Dataset was not found
*args – args as passed to Dataset
**kwargs – keyword args as passed to Dataset
- Returns:
Dataset
- property database: Database¶
Access to the stored database object. Creates a connection if none exists.
- Returns (Database):
Database
- property dbfile: str¶
Name of the database file
- sanitise_run_arg_paths(run_args: dict) dict [source]¶
Checks for issues in the paths within the given run_args
- property remote_dir: str¶
Accesses the remote_dir property from the run args. Tries to fall back on run_dir if not found, then returns default as a last resort.
- property run_dir: str | None¶
Accesses the remote_dir property from the run args. Tries to fall back on run_dir if not found, then returns default as a last resort.
- property run_path: str | bool¶
Accesses the remote_dir property from the run args. Tries to fall back on run_dir if not found, then returns default as a last resort.
- property local_dir: str¶
Accesses the local_dir property from the run args. Returns default if not found.
- property repo_prefix: str¶
override for repo names and manifest file in a dependency situation
- property repofile: TrackedFile¶
Returns the TrackedFile instance responsible for the repository
- property bash_repo: TrackedFile¶
Returns the TrackedFile instance responsible for the repository
- property master_script: TrackedFile¶
Returns the TrackedFile instance responsible for the master script
- property manifest_log: TrackedFile¶
Returns the TrackedFile instance responsible for the manifest
- property global_run_args: dict¶
Returns the toplevel global run args
- set_run_arg(key: str, val)[source]¶
Set a single run arg key to val
- Parameters:
key – name to set
val – value to set to
- Returns:
None
- set_run_args(keys: list, vals: list)[source]¶
Set a list of keys to the corresponding vals
Note
List lengths must be the same
- Parameters:
keys – list of keys to set
vals – list of vals to set to
- Returns:
None
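The pairing described above can be modelled in plain Python. This is only an illustration of the "lists must be the same length" rule: the free function below is a hypothetical stand-in, and raising ValueError on a mismatch is an assumption, since this page does not say how the real method reacts.

```python
def set_run_args(global_run_args: dict, keys: list, vals: list) -> None:
    """Assign vals to keys pairwise in global_run_args (illustrative model)."""
    if len(keys) != len(vals):
        # Assumption: the real method may react differently to a mismatch
        raise ValueError("keys and vals must have the same length")
    for key, val in zip(keys, vals):
        global_run_args[key] = val


run_args = {"remote_dir": "remote_runs"}
set_run_args(run_args, ["remote_dir", "local_dir"], ["staging", "results"])
print(run_args)
```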
- update_run_args(d: dict)[source]¶
Update current global run args with a dictionary d
- Parameters:
d – dict of new args
- Returns:
None
- property do_not_recurse: bool¶
Internal function used for blocking recursion in dependency calls
- property dependency: Dependency | None¶
Returns the stored dependency
- property is_child: bool¶
Returns True if this dataset is a child, False otherwise
- property is_parent: bool¶
Returns True if this dataset is a parent, False otherwise
- pack(file: str = None, **kwargs) dict | None [source]¶
Override for the SendableMixin.pack() method, ensuring the dataset is always below a uuid
- Parameters:
**kwargs – Any arguments to be passed onwards to the SendableMixin.pack()
- Returns:
(dict) packing result
- set_run_option(key: str, val) None [source]¶
Update a global run option key with value val
- Parameters:
key (str) – option to be updated
val – value to set
- append_run(args: dict = None, arguments: dict = None, name: str = None, extra_files_send: list | str | None = None, extra_files_recv: list | str | None = None, dependency_call: bool = False, verbose: int = None, quiet: bool = False, skip: bool = True, force: bool = False, lazy: bool = False, chain_run_args: bool = True, extra: str = None, return_runner: bool = False, **run_args)[source]¶
Serialise arguments for later runner construction
- Parameters:
args (dict) – dictionary of arguments to be unpacked
arguments (dict) – alias for args
name (str) – append a runner under this name
extra_files_send (list, str) – extra files to send with this run
extra_files_recv (list, str) – extra files to retrieve with this run
dependency_call (bool) – True if called via the dependency handler
verbose (int, Verbose, None) – verbose level for this runner (defaults to Dataset level)
quiet (bool) – disable printing for this append if True
skip (bool) – ignores checks for an existing runner if set to False
force (bool) – always appends if True
lazy (bool) – performs a “lazy” append if True, skipping the dataset update. You MUST call ds.finish_append() after you are done appending to avoid strange behaviours
chain_run_args (bool) – for dependency runs, will not propagate run_args to other datasets in the chain if False (defaults True)
extra – extra string to add to this runner
return_runner – returns the appended (or matching) runner if True
run_args – any extra arguments to pass to runner
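To illustrate why the lazy option exists, here is a toy model that counts how often a "database" would be updated. LazyAppendModel and its db_updates counter are hypothetical and are not part of remotemanager; the sketch only mirrors the documented contract that a lazy append skips the update and finish_append performs it.

```python
class LazyAppendModel:
    """Hypothetical model that counts 'database' updates during appends."""

    def __init__(self):
        self.runners = []
        self.db_updates = 0

    def append_run(self, args: dict, lazy: bool = False) -> None:
        self.runners.append(dict(args))
        if not lazy:
            # An eager append updates the database straight away
            self._update_db()

    def finish_append(self) -> None:
        # A single update covers everything appended lazily
        self._update_db()

    def _update_db(self) -> None:
        self.db_updates += 1


eager = LazyAppendModel()
for i in range(100):
    eager.append_run({"x": i})

lazy = LazyAppendModel()
for i in range(100):
    lazy.append_run({"x": i}, lazy=True)
lazy.finish_append()

print(eager.db_updates, lazy.db_updates)
```

In this model, 100 eager appends cost 100 updates, while 100 lazy appends followed by finish_append cost one.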
- insert_runner(runner: Runner, skip: bool = True, force: bool = False, lazy: bool = False, verbose: None | int | bool | Verbosity = None, quiet: bool = False, return_runner: bool = False) None | Runner [source]¶
Internal runner insertion.
- Parameters:
runner – Runner object to insert
skip – don’t insert if it exists
force – force inserts
lazy – Attempts a lazy append if True (does not update DB)
verbose – Verbosity level for this runner
quiet – inserts runner quietly if True
return_runner – Returns the runner object if True
- Returns:
None or Runner
- finish_append(dependency_call: bool = False, print_summary: bool = True, verbose: None | int | bool | Verbosity = None) None [source]¶
Completes the append process by updating the database, and printing a summary if necessary
- Parameters:
dependency_call – Will not attempt to relay to a dependency if True (called by dependency)
print_summary – Prints a summary if True
verbose – verbosity level for this call
- lazy_append() LazyAppend [source]¶
Access a LazyAppend object, which handles the append finalisation
- remove_run(ident: int | str | dict, dependency_call: bool = False, verbose: None | int | bool | Verbosity = None) bool [source]¶
Remove a runner with the given identifier. Search methods are identical to those of get_runner(ident)
- Parameters:
ident – identifier
dependency_call (bool) – used by any dependencies that exist, prevents recursion
verbose – local verbose level
- Returns:
True if succeeded
- Return type:
(bool)
- get_runner(ident: int | str | dict, dependency_call: bool = False, verbose: None | int | bool | Verbosity = None) Runner | None [source]¶
Collect a runner with the given identifier. Depending on the type of arg passed, there are different search methods:
int: the index of the runner in the runners list, i.e. runners[ident]
str: searches for a runner with the matching uuid
dict: attempts to find a runner with matching args
- Parameters:
ident – identifier
dependency_call (bool) – used by the dependencies, runners cannot be removed via uuid in this case, as the uuids will not match between datasets
- Returns:
collected Runner, None if not available
- Return type:
(Runner)
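The three search methods can be sketched as a type dispatch. RunnerModel and the free function get_runner below are hypothetical stand-ins, not the library's classes. Exact uuid matching and returning None for an out-of-range index are assumptions made for the sketch.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Union


@dataclass
class RunnerModel:
    """Hypothetical stand-in for a Runner: a uuid plus its arguments."""
    uuid: str
    args: dict = field(default_factory=dict)


def get_runner(
    runners: List[RunnerModel], ident: Union[int, str, dict]
) -> Optional[RunnerModel]:
    """Look up a runner by index (int), uuid (str) or arguments (dict)."""
    if isinstance(ident, int):
        # int: positional lookup; None if out of range (assumption)
        try:
            return runners[ident]
        except IndexError:
            return None
    if isinstance(ident, str):
        # str: match on uuid (assumption: exact match)
        return next((r for r in runners if r.uuid == ident), None)
    if isinstance(ident, dict):
        # dict: match on the stored arguments
        return next((r for r in runners if r.args == ident), None)
    return None


runners = [RunnerModel("aaa111", {"x": 1}), RunnerModel("bbb222", {"x": 2})]
print(get_runner(runners, 0).uuid)
print(get_runner(runners, "bbb222").args)
print(get_runner(runners, {"x": 3}))
```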
- wipe_runs(dependency_call: bool = False, confirm: bool = True) None [source]¶
Removes all runners
- Parameters:
dependency_call (bool) – used by any dependencies that exist, prevents recursion
confirm (bool) – Asks for confirmation if True
- reset_runs(wipe: bool = False, dependency_call: bool = False, confirm: bool = True) None [source]¶
Remove any results from the stored runners and attempt to delete their result files if wipe=True
Warning
This is a potentially destructive action, be careful with this method
- Parameters:
wipe – Additionally deletes the local files if True. Default False
dependency_call (bool) – used by any dependencies that exist, prevents recursion
confirm (bool) – Asks for confirmation if True
- collect_files(remote_check: bool, results_only: bool = False, extra_files_send: bool = True) list [source]¶
Collect created files
- Parameters:
remote_check – search for remote paths if True
results_only – only collect files that are returned from a run such as Results and extra_files_recv if True
extra_files_send – collects extra_files_send if True
- Returns:
list of filepaths
- wipe_local(files_only: bool = True, dry_run: bool = False, dependency_call: bool = False, confirm: bool = True) None [source]¶
Clear out the local directory
- Parameters:
files_only (bool) – delete individual files instead of whole folders (preserves extra files)
dry_run (bool) – print targets and exit
dependency_call (bool) – used by any dependencies that exist, prevents recursion
confirm (bool) – Asks for confirmation if True
- wipe_remote(files_only: bool = True, dry_run: bool = False, dependency_call: bool = False, confirm: bool = True) None [source]¶
Clear out the remote directory (including run dir)
- Parameters:
files_only (bool) – delete individual files instead of whole folders (preserves extra files)
dry_run (bool) – print targets and exit
dependency_call (bool) – used by any dependencies that exist, prevents recursion
confirm (bool) – Asks for confirmation if True
- hard_reset(files_only: bool = True, dry_run: bool = False, dependency_call: bool = False, confirm: bool = True) None [source]¶
Hard reset the dataset, including wiping local and remote folders
- Parameters:
files_only (bool) – delete individual files instead of whole folders (preserves extra files)
dry_run (bool) – print targets and exit
dependency_call (bool) – used by any dependencies that exist, prevents recursion
confirm (bool) – Asks for confirmation if True
- backup(file=None, force: bool = False, full: bool = False) str [source]¶
Backs up the Dataset and any attached results/extra files to zip file
- Parameters:
file – target path
force – overwrite file if it exists
full – only collects runner results if False (defaults False)
- Returns:
path to zip file
- classmethod restore(file, force: bool = False) Dataset [source]¶
Restore a Dataset from the backup file given by file
- Parameters:
file – File to restore from
force – Set to True to overwrite any existing Dataset
- Returns:
Dataset
- property runner_dict: dict¶
Stored runners in dict form, where the keys are the append id
- property states: List[RunnerState]¶
Runner states as a list of RunnerState
- property string_states: List[str]¶
Runner states as a list of strings
- property extra: str¶
Returns the global level extra
- property shebang: str¶
returns the url shebang
- property script: str¶
Currently stored run script
- Parameters:
sub_args – arguments to substitute into the script() method
- Returns:
arg-substituted script
- Return type:
(str)
- property add_newline: bool¶
Returns True if add_newline is set
This controls if scripts have an additional newline enforced at the end
- property submitter: str¶
Currently stored submission command
- property name: str¶
Name of this dataset
- property uuid: str¶
This Dataset’s full uuid (64 character)
- property short_uuid: str¶
This Dataset’s short format (8 character) uuid
- set_runner_states(state: str, uuids: list = None, extra: str = None, force: bool = False) None [source]¶
Update runner states to state
- Parameters:
state (str) – state to set
uuids (list) – list of uuids to update, updates all if not passed
- check_all_runner_states(state: str) bool [source]¶
Check all runner states against state, returning True if all runners have this state
- Parameters:
state (str) – state to check for
- Returns (bool):
all(states)
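The all(states) return value above amounts to comparing every runner state against the target. A minimal sketch follows, in which the state strings are hypothetical placeholders and the free function is a stand-in for the method. Note that all() of an empty iterable is True, so this sketch returns True when there are no runners; this page does not state how the library behaves in that case.

```python
from typing import Iterable


def check_all_runner_states(states: Iterable[str], state: str) -> bool:
    """Return True only if every entry in states equals state."""
    return all(s == state for s in states)


# Hypothetical state names, used purely for illustration
print(check_all_runner_states(["completed", "completed"], "completed"))
print(check_all_runner_states(["completed", "failed"], "completed"))
```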
- property last_run: int | None¶
Returns the unix time of the last _run call
- Returns:
unix time of last _run call, or None if impossible
- Return type:
(int)
- property run_summary_limit: int¶
If there are more runners than this number, the run output will be summarised rather than printed for each runner
- property summary_only: bool¶
Returns True if the number of runners exceeds the summary limit. Otherwise, returns False.
Used for printing a shortened output when running.
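A small sketch of how such a limit might switch between per-runner output and a summary. The function run_output, the state strings and the output format are all hypothetical; only the "more runners than the limit" condition and the default of 25 come from this page.

```python
from collections import Counter
from typing import List


def run_output(states: List[str], run_summary_limit: int = 25) -> List[str]:
    """Return per-runner lines, or per-state counts once the limit is exceeded."""
    if len(states) > run_summary_limit:
        # Too many runners: report one line per distinct state
        counts = Counter(states)
        return [f"{n} runner(s) {state}" for state, n in sorted(counts.items())]
    # Few enough runners: report each one individually
    return [f"runner {i}: {state}" for i, state in enumerate(states)]


print(run_output(["submitted"] * 2, run_summary_limit=2))
print(run_output(["submitted"] * 3, run_summary_limit=2))
```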
- retry_failed(*args, **kwargs) None [source]¶
Retries all failed runners
Takes args and kwargs, passes them to run
- stage(uuids: List[str] = None, force: bool = False, dependency_call: bool = False, extra: str = '', force_ignores_success: bool = False, verbose: Verbosity = None, **run_args) bool [source]¶
Stage all runners, generating all files and preparing for transfer and execution.
Returns a boolean, True if any new content was written.
- transfer(uuids: List[str] = None, force: bool = False, dependency_call: bool = False, extra: str = '', force_ignores_success: bool = False, verbose: Verbosity = None, **run_args) bool [source]¶
Transfer the files to the remote
- run(force: bool = False, dry_run: bool = False, verbose: None | int | bool | Verbosity = None, uuids: list = None, extra: str = '', force_ignores_success: bool = False, dependency_call: bool = False, **run_args) bool [source]¶
Run the functions
- Parameters:
force (bool) – force all runs to go through, ignoring checks
dry_run (bool) – create files, but do not run
verbose – Sets local verbose level
uuids (list) – list of uuids to run
extra – extra text to add to runner jobscripts
failed_only (bool) – If True, force will submit only failed runners
force_ignores_success (bool) – If True, force takes priority over is_success check
dependency_call (bool) – Internally used to block recursion issues with dependencies
run_args – any arguments to pass to the runners during this run. These override any “global” arguments set at Dataset init
- property run_cmd: CMD¶
Access to the storage of CMD objects used to run the scripts
- Returns:
List of CMD objects
- Return type:
(list)
- property is_finished: list¶
Queries the finished state of this Dataset
- property is_finished_force: list¶
Queries the finished state of this Dataset
- property all_finished: bool¶
Check if all runners have finished
- Returns (bool):
True if all runners have completed their runs
- property all_success: bool¶
Returns True if all runners report that they have succeeded
- wait(interval: int | float = 10, timeout: int | float = None, watch: bool = False, success_only: bool = False, only_runner: Runner = None, force: bool = False) None [source]¶
Watch the calculation, printing updates as runners complete
- Parameters:
interval – check interval time in seconds
timeout – maximum time to wait in seconds
watch – print an updating table of runner states
success_only – Completion search ignores failed runs if True
only_runner – wait for only this runner to complete
force – Raises dataset level errors as errors if True
- Returns:
None
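The interval and timeout parameters describe a polling loop. A generic sketch of that pattern is shown here; wait_until and the is_done callback are hypothetical names, and raising RuntimeError on timeout is an assumption, since this page does not say how the real method signals one.

```python
import time
from typing import Callable, Optional


def wait_until(
    is_done: Callable[[], bool],
    interval: float = 10,
    timeout: Optional[float] = None,
) -> None:
    """Poll is_done every interval seconds until it returns True."""
    start = time.monotonic()
    while not is_done():
        if timeout is not None and time.monotonic() - start > timeout:
            # Assumption: the real method may signal a timeout differently
            raise RuntimeError("timed out while waiting")
        time.sleep(interval)


checks = {"count": 0}


def finished_after_three_checks() -> bool:
    # Pretend the work completes on the third poll
    checks["count"] += 1
    return checks["count"] >= 3


wait_until(finished_after_three_checks, interval=0.01, timeout=5)
print(checks["count"])
```

Using time.monotonic rather than wall-clock time keeps the timeout unaffected by system clock adjustments.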
- fetch_results(results: bool = True, errors: bool = True, extras: bool = True, force: bool = False, verbose: None | int | bool | Verbosity = None)[source]¶
Fetch results from the remote, and store them in the runner results property
- Parameters:
results – fetch result files
errors – fetch error files
extras – fetch extra files
- Returns:
None
- update_runners(runners: list | None = None, dependency_call: bool = False)[source]¶
Collects the manifest file, updating runners
- Parameters:
runners – list of runners to update, usually used for dependencies
dependency_call – internal flag to avoid dependency loops
- property results: list¶
Access the results of the runners
- Returns (list):
runner.result for each runner
- property errors: list¶
Access the errors of the runners
- Returns (list):
runner.error for each runner
- property failed: list¶
Returns a list of failed runners
- Returns:
list of failed runners